46 research outputs found

    Towards Practical Privacy-Preserving Protocols

    Get PDF
    Protecting users' privacy in digital systems becomes more complex and challenging over time, as the amount of stored and exchanged data grows steadily and systems become increasingly involved and connected. Two techniques that try to approach this issue are Secure Multi-Party Computation (MPC) and Private Information Retrieval (PIR), which aim to enable practical computation while simultaneously keeping sensitive data private. In this thesis we present results showing how real-world applications can be executed in a privacy-preserving way. This is not only desired by users of such applications, but since 2018 also based on a strong legal foundation with the General Data Protection Regulation (GDPR) in the European Union, that forces companies to protect the privacy of user data by design. This thesis' contributions are split into three parts and can be summarized as follows: MPC Tools Generic MPC requires in-depth background knowledge about a complex research field. To approach this, we provide tools that are efficient and usable at the same time, and serve as a foundation for follow-up work as they allow cryptographers, researchers and developers to implement, test and deploy MPC applications. We provide an implementation framework that abstracts from the underlying protocols, optimized building blocks generated from hardware synthesis tools, and allow the direct processing of Hardware Definition Languages (HDLs). Finally, we present an automated compiler for efficient hybrid protocols from ANSI C. MPC Applications MPC was for a long time deemed too expensive to be used in practice. We show several use cases of real-world applications that can operate in a privacy-preserving, yet practical way when engineered properly and built on top of suitable MPC protocols. Use cases presented in this thesis are from the domain of route computation using BGP on the Internet or at Internet Exchange Points (IXPs). In both cases our protocols protect sensitive business information that is used to determine routing decisions. Another use case focuses on genomics, which is particularly critical as the human genome is connected to everyone during their entire lifespan and cannot be altered. Our system enables federated genomic databases, where several institutions can privately outsource their genome data and where research institutes can query this data in a privacy-preserving manner. PIR and Applications Privately retrieving data from a database is a crucial requirement for user privacy and metadata protection, and is enabled amongst others by a technique called Private Information Retrieval (PIR). We present improvements and a generalization of a well-known multi-server PIR scheme of Chor et al., and an implementation and evaluation thereof. We also design and implement an efficient anonymous messaging system built on top of PIR. Furthermore we provide a scalable solution for private contact discovery that utilizes ideas from efficient two-server PIR built from Distributed Point Functions (DPFs) in combination with Private Set Intersection (PSI)

    Lessons Learned: Defending Against Property Inference Attacks

    Full text link
    This work investigates and evaluates multiple defense strategies against property inference attacks (PIAs), a privacy attack against machine learning models. Given a trained machine learning model, PIAs aim to extract statistical properties of its underlying training data, e.g., reveal the ratio of men and women in a medical training data set. While for other privacy attacks like membership inference, a lot of research on defense mechanisms has been published, this is the first work focusing on defending against PIAs. With the primary goal of developing a generic mitigation strategy against white-box PIAs, we propose the novel approach property unlearning. Extensive experiments with property unlearning show that while it is very effective when defending target models against specific adversaries, property unlearning is not able to generalize, i.e., protect against a whole class of PIAs. To investigate the reasons behind this limitation, we present the results of experiments with the explainable AI tool LIME. They show how state-of-the-art property inference adversaries with the same objective focus on different parts of the target model. We further elaborate on this with a follow-up experiment, in which we use the visualization technique t-SNE to exhibit how severely statistical training data properties are manifested in machine learning models. Based on this, we develop the conjecture that post-training techniques like property unlearning might not suffice to provide the desirable generic protection against PIAs. As an alternative, we investigate the effects of simpler training data preprocessing methods like adding Gaussian noise to images of a training data set on the success rate of PIAs. We conclude with a discussion of the different defense approaches, summarize the lessons learned and provide directions for future work

    Association of asthma severity and educational attainment at age 6-7 years in a birth cohort : Population based record-linkage study

    Get PDF
    Acknowledgments: We are very grateful to David Fone for help in the concept of the study. SP received a Translational Health Research Platform Award from the National Institute for Social Care and Health Research (grant reference: TPR08- 15 006) for the development of WECC.JD was supported by The Centre for the Development and Evaluation of Complex Interventions for Public Health Improvement (DECIPHer), a UKCRC Public Health Research: Centre of Excellence. Funding from the British Heart Foundation, Cancer Research UK, Economic and Social Research Council (RES-590-28-0005), Medical Research Council, the Welsh Government and the Wellcome Trust (WT087640MA), under the auspices of the UK Clinical Research Collaboration, is gratefully acknowledged. SP was an applicant in the Centre for the Improvement of Population Health through E-records Research (CIPHER), one of four UK e-health Informatics Research Centres funded by a joint investment from: Arthritis Research UK, the British Heart Foundation, Cancer Research UK, the Chief Scientist Office (Scottish Government Health Directorates), the Economic and Social Research Council, the Engineering and Physical Sciences Research Council, the Medical Research Council,the National Institute for Health Research, the National Institute for Social Care and Health Research (Welsh Government) and the Wellcome Trust (Grant reference: 7 MR/K006525/1).Peer reviewe

    MP2ML: A Mixed-Protocol Machine Learning Framework for Private Inference

    Get PDF
    Privacy-preserving machine learning (PPML) has many applications, from medical image classification and anomaly detection to financial analysis. nGraph-HE enables data scientists to perform private inference of deep learning (DL) models trained using popular frameworks such as TensorFlow. nGraph-HE computes linear layers using the CKKS homomorphic encryption (HE) scheme. The non-polynomial activation functions, such as MaxPool and ReLU, are evaluated in the clear by the data owner who obtains the intermediate feature maps. This leaks the feature maps to the data owner from which it may be possible to deduce the DL model weights. As a result, such protocols may not be suitable for deployment, especially when the DL model is intellectual property. In this work, we present MP2ML, a machine learning framework which integrates nGraph-HE and the secure two-party computation framework ABY, to overcome the limitations of leaking the intermediate feature maps to the data owner. We introduce a novel scheme for the conversion between CKKS and secure multi-party computation to execute DL inference while maintaining the privacy of both the input data and model weights. MP2ML is compatible with popular DL frameworks such as TensorFlow that can infer pre-trained neural networks with native ReLU activations. We benchmark MP2ML on the CryptoNets network with ReLU activations, on which it achieves a throughput of 33.3 images/s and an accuracy of 98.6%. This throughput matches the previous state-of-the-art work, even though our protocol is more accurate and scalable

    Secure Similar Sequence Query on Outsourced Genomic Data

    Get PDF
    The growing availability of genomic data is unlocking research potentials on genomic-data analysis. It is of great importance to outsource the genomic-analysis tasks onto clouds to leverage their powerful computational resources over the large-scale genomic sequences. However, the remote placement of the data raises personal-privacy concerns, and it is challenging to evaluate data-analysis functions on outsourced genomic data securely and efficiently. In this work, we study the secure similar-sequence-query (SSQ) problem over outsourced genomic data, which has not been fully investigated. To address the challenges of security and efficiency, we propose two protocols in the mixed form, which combine two-party secure secret sharing, garbled circuit, and partial homomorphic encryptions together and use them to jointly fulfill the secure SSQ function. In addition, our protocols support multi-user queries over a joint genomic data set collected from multiple data owners, making our solution scalable. We formally prove the security of protocols under the semi-honest adversary model, and theoretically analyze the performance. We use extensive experiments over real-world dataset on a commercial cloud platform to validate the efficacy of our proposed solution, and demonstrate the performance improvements compared with state-of-the-art works

    Ferret: Fast Extension for coRRElated oT with small communication

    Get PDF
    Correlated oblivious transfer (COT) is a crucial building block for secure multi-party computation (MPC) and can be generated efficiently via OT extension. Recent works based on the pseudorandom correlation generator (PCG) paradigm presented a new way to generate random COT correlations using only communication sublinear to the output length. However, due to their high computational complexity, these protocols are only faster than the classical IKNP-style OT extension under restricted network bandwidth. In this paper, we propose new COT protocols in the PCG paradigm that achieve unprecedented performance. With 50 Mbps network bandwidth, our maliciously secure protocol can produce one COT correlation in 22 nanoseconds. More specifically, our results are summarized as follows: - We propose a semi-honest COT protocol with sublinear communication and linear computation. This protocol assumes primal-LPN and is built upon a recent VOLE protocol with semi-honest security by Schoppmann et al. (CCS 2019). We are able to apply various optimizations to reduce its communication cost by roughly 15x, not counting a one-time setup cost that diminishes as we generate more COTs. - We strengthen our COT protocol to malicious security with no loss of efficiency. Among all optimizations, our new protocol features a new checking technique that ensures correctness and consistency essentially for free. In particular, our maliciously secure protocol is only 1-3 nanoseconds slower for each COT. - We implemented our protocols, and the code will be publicly available at EMP-toolkit. We observe at least 9x improvement in running time compared to the state-of-the-art protocol by Boyle et al. (CCS 2019) in both semi-honest and malicious settings under any network faster than 50 Mbps. With this new record of efficiency for generating COT correlations, we anticipate new protocol designs and optimizations will flourish on top of our protocol

    QUOTIENT: Two-Party Secure Neural Network Training and Prediction

    Get PDF
    Recently, there has been a wealth of effort devoted to the design of secure protocols for machine learning tasks. Much of this is aimed at enabling secure prediction from highly-accurate Deep Neural Networks (DNNs). However, as DNNs are trained on data, a key question is how such models can be also trained securely. The few prior works on secure DNN training have focused either on designing custom protocols for existing training algorithms, or on developing tailored training algorithms and then applying generic secure protocols. In this work, we investigate the advantages of designing training algorithms alongside a novel secure protocol, incorporating optimizations on both fronts. We present QUOTIENT, a new method for discretized training of DNNs, along with a customized secure two-party protocol for it. QUOTIENT incorporates key components of state-of-the-art DNN training such as layer normalization and adaptive gradient methods, and improves upon the state-of-the-art in DNN training in two-party computation. Compared to prior work, we obtain an improvement of 50X in WAN time and 6% in absolute accuracy

    Oxygen stable isotope ratios from British oak tree-rings provide a strong and consistent record of past changes in summer rainfall

    Get PDF
    United Kingdom (UK) summers dominated by anti-cyclonic circulation patterns are characterised by clear skies, warm temperatures, low precipitation totals, low air humidity and more enriched oxygen isotope ratios (δ18O) in precipitation. Such conditions usually result in relatively more positive (enriched) oxygen isotope ratios in tree leaf sugars and ultimately in the tree-ring cellulose formed in that year, the converse being true in cooler, wet summers dominated by westerly air flow and cyclonic conditions. There should therefore be a strong link between tree-ring δ18O and the amount of summer precipitation. Stable oxygen isotope ratios from the latewood cellulose of 40 oak trees sampled at eight locations across Great Britain produce a mean δ18O chronology that correlates strongly and significantly with summer indices of total shear vorticity, surface air pressure, and the amount of summer precipitation across the England and Wales region of the United Kingdom. The isotope-based rainfall signal is stronger and much more stable over time than reconstructions based upon oak ring widths. Using recently developed methods that are precise, efficient and highly cost-effective it is possible to measure both carbon (δ13C) and oxygen (δ18O) isotope ratios simultaneously from the same tree-ring cellulose. In our study region, these two measurements from multiple trees can be used to reconstruct summer temperature (δ13C) and summer precipitation (δ18O) with sufficient independence to allow the evolution of these climate parameters to be reconstructed with high levels of confidence. The existence of long, well-replicated oak tree-ring chronologies across the British Isles mean that it should now be possible to reconstruct both summer temperature and precipitation over many centuries and potentially millennia

    CrypTFlow2: Practical 2-Party Secure Inference

    Get PDF
    We present CrypTFlow2, a cryptographic framework for secure inference over realistic Deep Neural Networks (DNNs) using secure 2-party computation. CrypTFlow2 protocols are both correct -- i.e., their outputs are bitwise equivalent to the cleartext execution -- and efficient -- they outperform the state-of-the-art protocols in both latency and scale. At the core of CrypTFlow2, we have new 2PC protocols for secure comparison and division, designed carefully to balance round and communication complexity for secure inference tasks. Using CrypTFlow2, we present the first secure inference over ImageNet-scale DNNs like ResNet50 and DenseNet121. These DNNs are at least an order of magnitude larger than those considered in the prior work of 2-party DNN inference. Even on the benchmarks considered by prior work, CrypTFlow2 requires an order of magnitude less communication and 20x-30x less time than the state-of-the-art

    Talek: Private Group Messaging with Hidden Access Patterns

    Get PDF
    Talek is a private group messaging system that sends messages through potentially untrustworthy servers, while hiding both data content and the communication patterns among its users. Talek explores a new point in the design space of private messaging; it guarantees access sequence indistinguishability, which is among the strongest guarantees in the space, while assuming an anytrust threat model, which is only slightly weaker than the strongest threat model currently found in related work. Our results suggest that this is a pragmatic point in the design space, since it supports strong privacy and good performance: we demonstrate a 3-server Talek cluster that achieves throughput of 9,433 messages/second for 32,000 active users with 1.7-second end-to-end latency. To achieve its security goals without coordination between clients, Talek relies on information-theoretic private information retrieval. To achieve good performance and minimize server-side storage, Talek intro- duces new techniques and optimizations that may be of independent interest, e.g., a novel use of blocked cuckoo hashing and support for private notifications. The latter provide a private, efficient mechanism for users to learn, without polling, which logs have new messages
    corecore